1.
J Neural Eng ; 21(1), 2024 Feb 07.
Article in English | MEDLINE | ID: mdl-38277701

ABSTRACT

Objective. Electroencephalography (EEG) is a widely used technology for recording brain activity in brain-computer interface (BCI) research, where understanding the encoding-decoding relationship between stimuli and neural responses is a fundamental challenge. Recently, there has been growing interest in encoding and decoding natural stimuli in a single-trial setting, as opposed to the traditional BCI literature, where multi-trial presentations of synthetic stimuli are commonplace. While EEG responses to natural speech have been extensively studied, such stimulus-following EEG responses to natural video footage remain underexplored. Approach. We collect a new EEG dataset with subjects passively viewing a film clip and extract a few video features that have been found to be temporally correlated with EEG signals. However, our analysis reveals that these correlations are mainly driven by shot cuts in the video. To avoid the confounds related to shot cuts, we construct another EEG dataset with natural single-shot videos as stimuli and propose a new set of object-based features. Main results. We demonstrate that previous video features lack robustness in capturing the coupling with EEG signals in the absence of shot cuts, and that the proposed object-based features exhibit significantly higher correlations. Furthermore, we show that the correlations obtained with these proposed features are not dominantly driven by eye movements. Additionally, we quantitatively verify the superiority of the proposed features in a match-mismatch task. Finally, we evaluate to what extent these proposed features explain the variance in coherent stimulus responses across subjects. Significance. This work provides valuable insights into feature design for video-EEG analysis and paves the way for applications such as visual attention decoding.
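To make the stimulus-following analysis concrete, below is a minimal sketch (not the authors' code) that correlates a video-feature time series with one EEG channel over a range of time lags; the sampling rate, lag range, and random data are placeholders.

```python
import numpy as np

def lagged_correlation(feature, eeg, max_lag):
    """Pearson correlation between a stimulus feature and an EEG
    channel for lags 0..max_lag (EEG delayed w.r.t. the stimulus)."""
    corrs = []
    for lag in range(max_lag + 1):
        x = feature[: len(feature) - lag]
        y = eeg[lag:]
        corrs.append(np.corrcoef(x, y)[0, 1])
    return np.array(corrs)

# Illustrative use: 64 Hz feature/EEG streams, lags up to ~500 ms.
fs = 64
feature = np.random.randn(fs * 60)   # e.g. an object-based video feature
eeg = np.random.randn(fs * 60)       # one EEG channel at the same rate
print(lagged_correlation(feature, eeg, max_lag=fs // 2))
```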


Subjects
Brain-Computer Interfaces, Electroencephalography, Humans, Electroencephalography/methods, Eye Movements, Algorithms
2.
Neural Netw ; 161: 659-669, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36841037

ABSTRACT

In this paper we describe the design of, and the ideas motivating, a new Continual Learning benchmark for Autonomous Driving (CLAD), which focuses on the problems of object classification and object detection. The benchmark utilises SODA10M, a recently released large-scale dataset concerning autonomous-driving-related problems. First, we review and discuss existing continual learning benchmarks, how they are related, and show that most are extreme cases of continual learning. To this end, we survey the benchmarks used in continual learning papers at three highly ranked computer vision conferences. Next, we introduce CLAD-C, an online classification benchmark realised through a chronological data stream that poses both class- and domain-incremental challenges, and CLAD-D, a domain-incremental continual object detection benchmark. We examine the inherent difficulties and challenges posed by the benchmark through a survey of the techniques and methods used by the top three participants in a CLAD challenge workshop at ICCV 2021. We conclude with possible pathways to improve the current continual learning state of the art, and with the directions we deem promising for future research.


Subjects
Automobile Driving, Benchmarking, Humans, Learning
3.
IEEE Trans Neural Netw Learn Syst ; 34(10): 7271-7285, 2023 Oct.
Article in English | MEDLINE | ID: mdl-35073270

ABSTRACT

Discovering novel visual categories from a set of unlabeled images is a crucial capability for intelligent vision systems, since it enables them to learn new concepts automatically, without human-annotated supervision. To tackle this problem, existing approaches first pretrain a neural network on a set of labeled images and then fine-tune the network to cluster unlabeled images into a few categorical groups. However, their unified feature representation hits a trade-off bottleneck between feature preservation on the labeled data and feature adaptation on the unlabeled data. To circumvent this bottleneck, we propose a residual-tuning approach, which estimates a new residual feature from the pretrained network and adds it to the basic feature to compute the clustering objective. This disentangled representation facilitates adjusting visual representations for the unlabeled images while avoiding forgetting the old knowledge acquired from the labeled images, without needing to replay the labeled images. In addition, residual-tuning is an efficient solution, adding few parameters and consuming modest training time. Our results on three common benchmarks show consistent and considerable gains over other state-of-the-art methods, and further reduce the performance gap to the fully supervised learning setup. Moreover, we explore two extended scenarios, using fewer labeled classes and continually discovering more unlabeled sets, where the results further demonstrate the advantages of our residual-tuning approach over previous approaches. Our code is available at https://github.com/liuyudut/ResTune.
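A minimal PyTorch sketch of the residual-tuning idea as described: a frozen pretrained backbone supplies the basic feature, and a small trainable branch adds a residual to it. The backbone interface and layer sizes are assumptions, not the authors' configuration.

```python
import torch.nn as nn

class ResidualTuner(nn.Module):
    """Basic feature from a frozen pretrained backbone plus a small
    trainable residual branch; the sum feeds the clustering objective."""
    def __init__(self, backbone, feat_dim=512):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():   # preserve old knowledge
            p.requires_grad = False
        self.residual = nn.Sequential(         # few extra parameters
            nn.Linear(feat_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, feat_dim),
        )

    def forward(self, x):
        basic = self.backbone(x)               # fixed representation
        return basic + self.residual(basic)    # adapted representation
```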

4.
IEEE Trans Pattern Anal Mach Intell ; 45(2): 2400-2411, 2023 Feb.
Article in English | MEDLINE | ID: mdl-35349431

ABSTRACT

SegBlocks reduces the computational cost of existing neural networks by dynamically adjusting the processing resolution of image regions based on their complexity. Our method splits an image into blocks and downsamples blocks of low complexity, reducing the number of operations and the memory consumption. A lightweight policy network, which selects the complex regions, is trained using reinforcement learning. In addition, we introduce several modules implemented in CUDA to process images in blocks. Most importantly, our novel BlockPad module prevents the feature discontinuities at block borders from which existing methods suffer, while keeping memory consumption under control. Our experiments on the Cityscapes, CamVid and Mapillary Vistas semantic segmentation datasets show that dynamically processing images offers a better accuracy-versus-complexity trade-off than static baselines of similar complexity. For instance, our method reduces the number of floating-point operations of SwiftNet-RN18 by 60% and increases the inference speed by 50%, with only a 0.3% decrease in mIoU accuracy on Cityscapes.
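A numpy sketch of the core block-wise idea (without the CUDA modules or the BlockPad border handling): split the image into blocks and halve the resolution of the simple ones. A variance threshold stands in here for the learned policy network.

```python
import numpy as np

def blockwise_downsample(img, block=32, thresh=50.0):
    """Keep high-variance (complex) blocks at full resolution and
    2x-downsample the rest. Variance stands in for the RL policy."""
    h, w = img.shape[:2]
    out = []
    for y in range(0, h, block):
        for x in range(0, w, block):
            b = img[y:y + block, x:x + block]
            if b.var() < thresh:        # 'simple' region
                b = b[::2, ::2]         # process at half resolution
            out.append(b)
    return out                          # blocks of mixed resolution
```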

5.
Nat Mach Intell ; 4(12): 1185-1197, 2022.
Article in English | MEDLINE | ID: mdl-36567959

ABSTRACT

Incrementally learning new information from a non-stationary stream of data, referred to as 'continual learning', is a key feature of natural intelligence, but a challenging problem for deep neural networks. In recent years, numerous deep learning methods for continual learning have been proposed, but comparing their performances is difficult due to the lack of a common framework. To help address this, we describe three fundamental types, or 'scenarios', of continual learning: task-incremental, domain-incremental and class-incremental learning. Each of these scenarios has its own set of challenges. To illustrate this, we provide a comprehensive empirical comparison of currently used continual learning strategies, by performing the Split MNIST and Split CIFAR-100 protocols according to each scenario. We demonstrate substantial differences between the three scenarios in terms of difficulty and in terms of the effectiveness of different strategies. The proposed categorization aims to structure the continual learning field, by forming a key foundation for clearly defining benchmark problems.
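As a concrete illustration of how the three scenarios differ, the sketch below shows what a model must predict at test time under each scenario for Split MNIST (five tasks of two digits each). The code is illustrative, not from the paper.

```python
# Split MNIST: five tasks of two digit classes each.
TASKS = [(0, 1), (2, 3), (4, 5), (6, 7), (8, 9)]

def eval_target(scenario, digit, task_id):
    """What the model must predict for one test digit."""
    if scenario == "task-incremental":
        # Task identity is given; choose within the task's two classes.
        return ("task %d" % task_id, TASKS[task_id].index(digit))
    if scenario == "domain-incremental":
        # Task identity unknown; only the within-task label is required.
        return ("unknown task", TASKS[task_id].index(digit))
    if scenario == "class-incremental":
        # Task identity unknown; all ten classes must be distinguished.
        return ("unknown task", digit)

print(eval_target("class-incremental", digit=7, task_id=3))
```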

6.
IEEE Trans Pattern Anal Mach Intell ; 44(7): 3366-3385, 2022 Jul.
Article in English | MEDLINE | ID: mdl-33544669

ABSTRACT

Artificial neural networks thrive at solving the classification problem for a particular rigid task, acquiring knowledge through generalized learning behaviour from a distinct training phase. The resulting network resembles a static entity of knowledge, and attempts to extend this knowledge without targeting the original task result in catastrophic forgetting. Continual learning shifts this paradigm towards networks that can continually accumulate knowledge over different tasks without needing to retrain from scratch. We focus on task-incremental classification, where tasks arrive sequentially and are delineated by clear boundaries. Our main contributions concern: (1) a taxonomy and extensive overview of the state of the art; (2) a novel framework to continually determine the stability-plasticity trade-off of the continual learner; (3) a comprehensive experimental comparison of 11 state-of-the-art continual learning methods; and (4) baselines. We empirically scrutinize method strengths and weaknesses on three benchmarks, considering Tiny ImageNet, the large-scale unbalanced iNaturalist, and a sequence of recognition datasets. We study the influence of model capacity, weight decay and dropout regularization, and the order in which the tasks are presented, and we qualitatively compare methods in terms of required memory, computation time, and storage.


Subjects
Algorithms, Learning, Neural Networks, Computer
7.
Article in English | MEDLINE | ID: mdl-32142437

ABSTRACT

Zero-shot learning (ZSL) has attracted significant attention due to its capability of classifying new images from unseen classes. To perform the classification task in ZSL, learning visual and semantic embeddings has been the main research approach in the existing literature. At the same time, generating complementary explanations to justify the classification decision has remained largely unexplored. In this paper, we propose to address a new and challenging task, namely explainable zero-shot learning (XZSL), which aims to generate visual and textual explanations to support the classification decision. To accomplish this task, we build a novel Deep Multi-modal Explanation (DME) model that incorporates a joint visual-attribute embedding module and a multi-channel explanation module in an end-to-end fashion. In contrast to existing ZSL approaches, our visual-attribute embedding is associated not only with the decision but also with new visual and textual explanations. For visual explanations, we first capture several attribute activation maps (AAMs) and then merge them into a class activation map (CAM) that visually infers which region of an image is relevant to the class. Textual explanations are generated by the multi-channel explanation module, which jointly integrates three long short-term memory (LSTM) models, each conditioned on a different feature representation. Additionally, we suggest that the DME model can retain explanatory consistency for similar instances and explanatory diversity for diverse instances. We conduct qualitative and quantitative experiments to assess the model for ZSL classification and explanation. Specifically, the ablation studies verify the effectiveness of the components of our model. Our results on three well-known datasets are competitive with prior approaches. More importantly, the joint training of our embedding and explanation modules demonstrates mutual performance improvements between ZSL classification and explanation. Finally, we shed more light on DME, analyzing and diagnosing its advantages and limitations.
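A sketch of the visual-explanation step as the abstract outlines it: per-attribute activation maps are merged into a single class activation map. The score-weighted sum below is one plausible merge, an assumption rather than the paper's exact scheme.

```python
import numpy as np

def merge_aams_to_cam(aams, attr_scores):
    """Merge per-attribute activation maps (AAMs) into one class
    activation map (CAM); attribute relevance scores weight the sum
    (one plausible merge, not necessarily the paper's)."""
    aams = np.asarray(aams)                 # shape (num_attrs, H, W)
    w = np.asarray(attr_scores)             # shape (num_attrs,)
    cam = np.tensordot(w, aams, axes=1)     # weighted sum over attributes
    cam -= cam.min()
    return cam / (cam.max() + 1e-8)         # normalise to [0, 1]
```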

8.
IEEE Trans Pattern Anal Mach Intell ; 42(11): 2825-2841, 2020 Nov.
Article in English | MEDLINE | ID: mdl-31094682

ABSTRACT

In this paper, a novel benchmark is introduced for evaluating local image descriptors. We demonstrate limitations of the commonly used datasets and evaluation protocols that lead to ambiguities and contradictory results in the literature. Furthermore, these benchmarks are nearly saturated due to the recent improvements in local descriptors obtained by learning from large annotated datasets. To address these issues, we introduce a new large dataset suitable for training and testing modern descriptors, together with strictly defined evaluation protocols for several tasks, such as matching, retrieval and verification. This allows for more realistic, and thus more reliable, comparisons in different application scenarios. We evaluate the performance of several state-of-the-art descriptors and analyse their properties. We show that a simple normalisation of traditional hand-crafted descriptors is able to boost their performance to the level of deep-learning-based descriptors once realistic benchmarks are considered. Additionally, we specify a protocol for learning and evaluating using cross-validation. We show that when training state-of-the-art descriptors on this dataset, the traditional verification task is almost entirely saturated.
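The abstract does not name the "simple normalisation"; one well-known transform of this kind is the RootSIFT-style L1-normalise-then-square-root, sketched below as an assumption rather than the paper's exact procedure.

```python
import numpy as np

def root_normalise(desc, eps=1e-8):
    """RootSIFT-style normalisation of a raw, non-negative SIFT-like
    descriptor: L1-normalise, then take the element-wise square root
    (unit L2 norm follows automatically)."""
    desc = desc / (np.abs(desc).sum() + eps)
    return np.sqrt(desc)

d = np.random.rand(128)                     # a hand-crafted 128-D descriptor
print(np.linalg.norm(root_normalise(d)))    # ~1.0
```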


Subjects
Deep Learning, Image Processing, Computer-Assisted/methods, Algorithms
9.
IEEE Trans Pattern Anal Mach Intell ; 40(8): 1932-1947, 2018 Aug.
Article in English | MEDLINE | ID: mdl-28841552

ABSTRACT

In this paper, we present a method that estimates reflectance and illumination information from a single image depicting a single-material specular object from a given class under natural illumination. We follow a data-driven, learning-based approach trained on a very large dataset, but in contrast to earlier work we do not assume one or more components (shape, reflectance, or illumination) to be known. We propose a two-step approach, in which we first estimate the object's reflectance map and then further decompose it into reflectance and illumination. For the first step, we introduce a convolutional neural network (CNN) that directly predicts a reflectance map from the input image itself, as well as an indirect scheme that uses additional supervision, first estimating surface orientation and then inferring the reflectance map using a learning-based sparse-data interpolation technique. For the second step, we suggest a CNN architecture to reconstruct both Phong reflectance parameters and high-resolution spherical illumination maps from the reflectance map. We also propose new datasets to train these CNNs. We demonstrate the effectiveness of our approach for both steps through extensive quantitative and qualitative evaluation on both synthetic and real data, as well as through numerous applications that show improvements over the state of the art.
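For reference, the Phong parameters the second CNN reconstructs enter the standard Phong model, sketched below; the vectors and coefficients are illustrative, and this is the classical shading formula, not the authors' network.

```python
import numpy as np

def phong_reflectance(n, l, v, kd, ks, alpha):
    """Standard Phong shading for unit normal n, light l, view v:
    kd * max(n.l, 0) + ks * max(r.v, 0)**alpha, where r is the
    reflection of l about n."""
    r = 2.0 * np.dot(n, l) * n - l          # reflected light direction
    diffuse = kd * max(np.dot(n, l), 0.0)
    specular = ks * max(np.dot(r, v), 0.0) ** alpha
    return diffuse + specular

n = np.array([0.0, 0.0, 1.0])               # surface normal
l = np.array([0.0, 0.6, 0.8])               # light direction (unit)
print(phong_reflectance(n, l, v=n, kd=0.7, ks=0.3, alpha=20))
```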

10.
IEEE Trans Pattern Anal Mach Intell ; 39(4): 773-787, 2017 Apr.
Article in English | MEDLINE | ID: mdl-28278449

ABSTRACT

We propose a function-based temporal pooling method that captures the latent structure of video sequence data; for example, how frame-level features evolve over time in a video. We show how the parameters of a function that has been fit to the video data can serve as a robust new video representation. As a specific example, we learn a pooling function via ranking machines. By learning to rank the frame-level features of a video in chronological order, we obtain a new representation that captures the video-wide temporal dynamics of a video, suitable for action recognition. Beyond ranking functions, we explore different parametric models that could also explain the temporal changes in videos. The proposed functional pooling methods, and rank pooling in particular, are easy to interpret and implement, fast to compute, and effective in recognizing a wide variety of actions. We evaluate our method on various benchmarks for generic action, fine-grained action and gesture recognition. Results show that rank pooling brings an absolute improvement of 7-10% over the average pooling baseline. At the same time, rank pooling is compatible with and complementary to several appearance- and local-motion-based methods and features, such as improved trajectories and deep learning features.
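A minimal sketch of the rank-pooling idea follows: fit a linear function that orders frame features chronologically and use its parameters as the video descriptor. A least-squares fit stands in here for the ranking machine used in the paper.

```python
import numpy as np

def rank_pool(frames):
    """frames: (T, D) array of frame-level features in time order.
    Fit w so that w.x_t increases with t; w is the video descriptor.
    Least squares stands in for the paper's ranking machine."""
    T = frames.shape[0]
    t = np.arange(1, T + 1, dtype=float)    # chronological targets
    w, *_ = np.linalg.lstsq(frames, t, rcond=None)
    return w                                # (D,) video representation

video = np.cumsum(np.random.randn(100, 64), axis=0)  # toy evolving features
print(rank_pool(video).shape)               # (64,)
```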

11.
IEEE Trans Pattern Anal Mach Intell ; 39(8): 1576-1590, 2017 Aug.
Article in English | MEDLINE | ID: mdl-27541489

ABSTRACT

Taking an image of an object is at its core a lossy process. The rich information about the three-dimensional structure of the world is flattened to an image plane, and decisions such as viewpoint and camera parameters are final and not easily reversible. As a consequence, the possibilities for changing the viewpoint are limited. Given a single image depicting an object, novel-view synthesis is the task of generating new images that render the object from a different viewpoint than the given one. The main difficulty is synthesizing the parts that are disoccluded; disocclusion occurs when parts of an object are hidden by the object itself under a specific viewpoint. In this work, we show how to improve novel-view synthesis by making use of the correlations observed in 3D models and applying them to new image instances. We propose a technique that uses the structural information extracted from a 3D model matching the image object in terms of viewpoint and shape. For the latter part, we propose an efficient 2D-to-3D alignment method that precisely associates the image appearance with the 3D model geometry with minimal user interaction. Our technique is able to simulate plausible viewpoint changes for a variety of object classes within seconds. Additionally, we show that our synthesized images can be used as additional training data that improves the performance of standard object detectors.

12.
IEEE Trans Image Process ; 25(5): 2259-2274, 2016 May.
Article in English | MEDLINE | ID: mdl-27458637

ABSTRACT

This paper proposes a generic methodology for the semi-automatic generation of reliable position annotations for evaluating multi-camera people-trackers on large video datasets. Most of the annotation data are computed automatically, by estimating a consensus tracking result from multiple existing trackers and people detectors and classifying it as either reliable or not. A small subset of the data, composed of tracks with insufficient reliability, is verified by a human using a simple binary decision task, a process faster than marking the correct person position. The proposed framework is generic and can handle additional trackers. We present results on a dataset of approximately 6 hours captured by 4 cameras, featuring a person in a holiday flat, performing activities such as walking, cooking, eating, cleaning, and watching TV. When aiming for a tracking accuracy of 60 cm, 80% of all video frames are automatically annotated. The annotations for the remaining 20% of the frames were added after human verification of an automatically selected subset of the data. This involved approximately 2.4 hours of manual labour. According to a subsequent comprehensive visual inspection to judge the annotation procedure, we found 99% of the automatically annotated frames to be correct. We provide guidelines on how to apply the proposed methodology to new datasets. We also provide an exploratory study for the multi-target case, applied to existing and new benchmark video sequences.
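A toy sketch of the consensus-and-reliability step described above: the per-frame consensus position is taken here as the median over trackers, and a frame counts as automatically annotated only when the trackers agree within the target accuracy. The median rule and the threshold are assumptions, not the paper's exact classifier.

```python
import numpy as np

def consensus_position(tracks, max_spread=0.6):
    """tracks: (num_trackers, 2) ground-plane positions in metres for
    one frame. Consensus is the per-axis median; the frame is reliably
    auto-annotated only if all trackers agree within max_spread."""
    tracks = np.asarray(tracks)
    consensus = np.median(tracks, axis=0)
    spread = np.linalg.norm(tracks - consensus, axis=1).max()
    return consensus, bool(spread <= max_spread)  # else: human verifies

pos, reliable = consensus_position([[1.0, 2.0], [1.1, 2.0], [0.9, 2.1]])
print(pos, reliable)
```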


Subjects
Data Curation/methods, Human Activities/classification, Image Processing, Computer-Assisted/methods, Video Recording/methods, Algorithms, Humans
13.
Article in English | MEDLINE | ID: mdl-26737890

ABSTRACT

More than thirty percent of persons over 65 fall at least once a year and are often unable to get up again unaided. The lack of timely aid after such a fall can lead to severe complications. Timely aid can, however, be ensured by a camera-based fall-detection system that triggers an alarm when a fall occurs. Most algorithms described in the literature extract the fall features from the largest object detected by background subtraction. In this paper we compare the performance of our state-of-the-art fall-detection algorithm when using only background subtraction, when using a particle filter to track the person, and when using a hybrid method in which the particle filter only enhances the background subtraction and is not used for the feature extraction. We tested this using our simulation dataset containing re-enactments of real-life falls. The comparison shows that the hybrid method significantly increases the sensitivity and robustness of the fall-detection algorithm, resulting in a sensitivity of 76.1% and a PPV of 41.2%.
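A minimal sketch of the background-subtraction stage, using OpenCV's stock MOG2 subtractor as a stand-in for the paper's method; the particle-filter guidance and the fall features themselves are outside this snippet.

```python
import cv2

subtractor = cv2.createBackgroundSubtractorMOG2(history=500,
                                                detectShadows=True)

def biggest_foreground_blob(frame):
    """Foreground mask via background subtraction; return the largest
    contour, from which fall features (e.g. bounding-box aspect ratio)
    would then be computed. Assumes OpenCV 4."""
    mask = subtractor.apply(frame)
    mask = cv2.threshold(mask, 200, 255, cv2.THRESH_BINARY)[1]  # drop shadows
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return max(contours, key=cv2.contourArea) if contours else None
```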


Subjects
Accidental Falls, Filtration/instrumentation, Photography/instrumentation, Aged, Algorithms, Humans
14.
Article in English | MEDLINE | ID: mdl-23366916

ABSTRACT

In this study we introduce a method for detecting myoclonic jerks during the night with video. Using video instead of the traditional EEG electrodes permits patients to sleep without any attached sensors, which improves comfort during sleep and makes long-term home monitoring possible. The algorithm for detecting the seizures is based on spatio-temporal interest points (STIPs), proposed by Ivan Laptev, a state-of-the-art technique in action recognition. We applied this algorithm to a group of patients suffering from myoclonic jerks. With an optimal parameter setting, this resulted in a sensitivity of over 75% and a PPV of over 85% on the patients' combined data.


Subjects
Anatomic Landmarks/pathology, Epilepsies, Myoclonic/diagnosis, Imaging, Three-Dimensional/methods, Myoclonus/diagnosis, Pattern Recognition, Automated/methods, Polysomnography/methods, Video Recording/methods, Child, Child, Preschool, Epilepsies, Myoclonic/physiopathology, Female, Humans, Male, Monitoring, Ambulatory/methods, Myoclonus/physiopathology, Photography/methods, Reproducibility of Results, Sensitivity and Specificity
15.
IEEE Trans Pattern Anal Mach Intell ; 32(10): 1809-1821, 2010 Oct.
Article in English | MEDLINE | ID: mdl-20724758

ABSTRACT

Object matching is a fundamental operation in data analysis. It typically requires the definition of a similarity measure between the classes of objects to be matched. Instead, we develop an approach which is able to perform matching by requiring a similarity measure only within each of the classes. This is achieved by maximizing the dependency between matched pairs of observations by means of the Hilbert-Schmidt Independence Criterion. The problem can be cast as a quadratic assignment problem with special structure, and we present a simple algorithm for finding a locally optimal solution.
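For concreteness, below is a numpy sketch of the biased empirical HSIC that the matching objective maximizes, HSIC(K, L) = tr(KHLH) / (n-1)^2, where K and L are kernel matrices computed within each class and H is the centring matrix; the Gaussian kernel is an illustrative choice.

```python
import numpy as np

def gaussian_gram(X, sigma=1.0):
    """Gaussian kernel matrix over the rows of X (one class)."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2))

def hsic(K, L):
    """Biased empirical HSIC: tr(K H L H) / (n-1)^2, with H the
    centring matrix. Each class needs only its own kernel."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

X, Y = np.random.randn(50, 3), np.random.randn(50, 5)
print(hsic(gaussian_gram(X), gaussian_gram(Y)))  # near 0 when independent
```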

16.
IEEE Trans Pattern Anal Mach Intell ; 29(9): 1575-1589, 2007 Sep.
Article in English | MEDLINE | ID: mdl-17627045

ABSTRACT

This paper presents a novel approach for visual scene modeling and classification, investigating the combined use of text modeling methods and local invariant features. Our work attempts to elucidate (1) whether a text-like bag-of-visterms representation (histogram of quantized local visual features) is suitable for scene (rather than object) classification, (2) whether some analogies between discrete scene representations and text documents exist, and (3) whether unsupervised, latent space models can be used both as feature extractors for the classification task and to discover patterns of visual co-occurrence. Using several data sets, we validate our approach, presenting and discussing experiments on each of these issues. We first show, with extensive experiments on binary and multi-class scene classification tasks using a 9,500-image data set, that the bag-of-visterms representation consistently outperforms classical scene classification approaches. In other data sets we show that our approach competes with or outperforms other recent, more complex, methods. We also show that Probabilistic Latent Semantic Analysis (PLSA) generates a compact scene representation, discriminative for accurate classification, and more robust than the bag-of-visterms representation when less labeled training data is available. Finally, through aspect-based image ranking experiments, we show the ability of PLSA to automatically extract visually meaningful scene patterns, making such representation useful for browsing image collections.
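A scikit-learn sketch of the bag-of-visterms construction described above: quantize local descriptors against a k-means visual vocabulary and histogram the assignments per image. The vocabulary size is illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def bag_of_visterms(descriptors_per_image, vocab_size=1000):
    """Quantize local descriptors into 'visterms' with k-means and
    represent each image as a histogram over the vocabulary."""
    all_desc = np.vstack(descriptors_per_image)
    vocab = KMeans(n_clusters=vocab_size, n_init=4).fit(all_desc)
    hists = []
    for desc in descriptors_per_image:
        words = vocab.predict(desc)
        h = np.bincount(words, minlength=vocab_size).astype(float)
        hists.append(h / h.sum())       # text-like term frequencies
    return np.array(hists)
```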


Subjects
Algorithms, Artificial Intelligence, Databases, Factual, Image Interpretation, Computer-Assisted/methods, Information Storage and Retrieval/methods, Pattern Recognition, Automated/methods, Image Enhancement/methods, Natural Language Processing, Reproducibility of Results, Sensitivity and Specificity